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1  Technical  Abstract 

In  this  final  technical  report  for  the  phase  I  Air  Force  STTR  contract  FQ8671-9901279,  we  summarize  the  technical 
objectives,  work  accomplished,  results,  and  technical  feasibility.  The  primary  objective  of  this  research  is  to  develop 
methodology  and  software  for  the  analysis  of  atmospheric  turbulence  and  related  data.  Turbulence  data  are  challenging 
because  they  are  inherently  non-stationary  across  a  range  of  scales.  Because  the  discrete  wavelet  transform  is  a  natural 
tool  for  use  with  non-stationary  and  scale-dependent  data,  we  investigated  the  efficacy  of  a  variety  of  wavelet-based 
techniques  including  approximate  maximum  likelihood  and  least  squares  estimators.  These  estimators  were  adapted 
to  work  effectively  in  the  presence  of  (i)  slow  variations  in  the  power-law  parameters,  (ii)  large  scale  stochastic  trends 
and  (iii)  small  scale  non-turbulent  events.  The  statistical  properties  of  (most)  wavelet-based  power-law  parameter 
estimators  were  examined  and  confidence  intervals  created.  We  assessed  the  efficacy  of  power-law  models  (which  are 
proposed  capture  the  salient  features  in  actual  turbulence  measurements)  by  studying  the  variability  in  the  wavelet 
coefficients  at  particular  scales  and  comparing  it  to  the  variability  that  would  occur  if  a  time-varying  power-law  model 
were  an  adequate  description.  We  also  investigated  the  relevance  of  nonlinear  deterministic  modeling  of  wavelet  sub¬ 
bands  and  used  the  technique  for  short-term  prediction.  All  software  was  implemented  in  MathSoft’s  next  generation 
S+Wavelets  module. 

2  Introduction:  Team  Members  and  Phase  I  Publication  Summary 

2.1  Project  Members 

Principal  Investigator:  William  L.  B.  Constantine,  Research  Scientist 
Data  Analysis  Products  Division  of  MathSoft,  Inc.,  Seattle,  WA 

Dr.  Constantine  is  in  charge  of  all  aspects  of  the  project  including  software  design,  development  and  implementation. 
His  expertise  is  in  digital  signal  processing  (with  an  emphasis  in  wavelets)  and  nonlinear  dynamics. 

Investigator:  Donald  B.  Percival,  Senior  Mathematician 
Data  Analysis  Products  Division  of  MathSoft,  Inc.,  Seattle,  WA 

Dr.  Percival  is  an  integral  member  of  the  phase  I  team  and  is  an  expert  in  wavelet  theory,  time  series  modeling,  and 
spectral  analysis  techniques. 

Investigator:  Z.  Q.  John  Lu,  Research  Scientist 

Data  Analysis  Products  Division  of  MathSoft,  Inc.,  Seattle,  WA 

Dr.  Lu  is  in  charge  of  the  development  of  (novel)  deterministic  modeling  and  prediction  algorithms  for  the  phase  I 
contract.  His  principal  area  of  expertise  is  in  multivariate  nonlinear  regression,  nonlinear  time  series  analysis,  and 
statistical  modeling  of  fractal  geometry. 

Consultant:  Per  G.  Reinhall,  Professor 

Department  of  Mechanical  Engineering,  University  of  Washington,  Seattle 

Dr.  Reinhall  is  the  lead  investigator  for  the  phase  I  academic  partnership  with  the  University  of  Washington.  His 
area  of  expertise  is  in  nonlinear  dynamics  and  time  series  analysis,  modeling  of  biological  systems,  vibrations  and 
manufacturing. 

Consultant:  James  Bassingthwaighte,  Professor 

Department  of  Bioengineering  and  Biomathematics,  University  of  Washington,  Seattle 

Dr.  Bassingthwaighte  has  played  a  supporting  role  in  the  development  of  new  fractal  time  series  algorithms  and  is 
considered  an  expert  in  biomedical  fractal  time  series  analysis. 

Academic  Researcher:  Peter  Craigmile,  Ph.D.  Candidate 
Department  of  Statistics,  University  of  Washington,  Seattle 

Peter  has  played  a  pivotal  role  in  the  development  of  stochastic  modeling  techniques  for  fractal  (turbulence)  time 
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series.  His  Ph.D.  is  focused  on  stochastic  modeling  of  long  memory  processes. 

2.2  Published  Material 

We  made  significant  progress  in  our  phase  I  research  which  has  lead  to  the  following  publications: 

•  P.  Craigmile,  Percival,  D.B.  and  Guttorp,  P.  (2000),  “Wavelet-Based  Parameter  Estimation  for  Trend  Contaminated 
Fractionally  Differenced  Processes,”  Technical  Report  for  National  Research  Center  for  Statistics  and  the  Environ¬ 
ment,  University  of  Washington. 

•  P.  Craigmile,  Percival,  D.B.  and  Guttorp,  P.  (2000)  Decorrelation  Properties  of  Wavelet  Based  Estimators  for 
Fractionally  Differenced  Processes.  Proceedings  of  the  3rd  European  Conference  of  Mathematics:  Birkhauser  Verlag. 

•  P.  Craigmile,  Percival,  D.B.  and  Guttorp,  P.  (2000),  “Assessing  Non-linear  Trends  using  the  Discrete  Wavelet 
Transform,”  Technical  Report  for  National  Research  Center  for  Statistics  and  the  Environment,  University  of  Wash- 
ington. 

•  Peter  F.  Craigmile  (2000)  Wavelet  Based  Estimation  for  Trend  Contaminated  Long  Memory  Processes.  Ph.D. 
Dissertation:  University  of  Washington,  Department  of  Statistics. 

•  W.  Constantine,  D.B.  Percival,  P.G.  Reinhall,  (2000).  “Modeling  Aerothermal  Turbulence  using  Fractionally  Differ¬ 
enced  Processes,”  to  be  submitted  to  Physical  Review  E  after  approval  from  AFOSR  Public  Affairs. 

•  D.  B.  Percival  and  A.  T.  Walden  (2000)  Wavelet  Methods  for  Time  Series  Analysis.  Cambridge  University  Press. 
These  publications  were  fully  or  partially  supported  by  this  phase  I  contract. 


3  Technical  Objectives 

The  primary  phase  I  research  and  development  objectives  are  as  follows. 

1.  Review  of  models  for  non- stationary  and  multi-scale  turbulence  data.  Using  actual  turbulence  measurements,  we 
will  evaluate  the  relative  merits  of  three  models  for  handling  such  data.  The  first  is  a  stochastic  model  based 
directly  on  the  octave-band  decomposition  given  by  the  discrete  wavelet  transform  (DWT),  in  which  we  seek  to 
model  each  subband  series  in  the  decomposition  using  traditional  time  series  methods  (e.g.,  autoregressive  models 
and  generalizations  thereof).  The  second  is  also  a  stochastic  model  and  was  used  recently  by  Papanicolaou  et 
al.  [20).  This  model  treats  turbulence  as  a  time-evolving  power  law  process  whose  exponent  obeys  a  first  order 
autoregressive  model.  The  third  model  is  based  on  the  idea  of  representing  turbulence  as  a  deterministic  multi- 
scale  fractal  process. 

2.  Development  of  estimators  for  stochastic  and  deterministic  fractal  processes.  For  the  two  stochastic  models 
to  be  considered,  we  will  investigate  the  relative  merits  of  approximate  maximum  likelihood  and  least  squares 
estimators  of  the  relevant  model  parameters.  We  will  investigate  the  effect  on  parameter  estimation  due  to 
the  wavelet  that  is  chosen  to  form  the  DWT  (the  choice  of  wavelet  is  particularly  important  because,  while 
it  is  tempting  to  use  the  Haar  wavelet  because  of  its  inherent  simplicity,  we  have  found  in  working  with  other 
turbulence  data  [21]  that  the  Haar  wavelet  does  not  work  well  for  powers  laws  fa  with  a  <  -2.5  or  for  power  laws 
observed  in  the  presence  of  significant  large  scale  stochastic  or  deterministic  trends).  For  deterministic  chaotic 
modeling  of  turbulence  data,  we  will  assess  the  applicability  of  nonlinear  dynamic  measures  such  as  generalized 
fractal  dimensions,  Lyapunov  exponents  and  approximate  entropy. 

3.  Development  of  analysis  strategy.  We  will  develop  an  overall  strategy  for  analyzing  multi-scale  fractal  processes. 
This  strategy  will  help  guide  the  non-expert  in  their  work  and  will  influence  the  software  design. 

4.  Integration  of  methods  into  a  user-friendly  software  environment.  We  will  implement  the  software  needed  for 
model  evaluation  and  model  estimation  into  MathSoft’s  next  generation  S+ Wavelets  software. 
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4  Technical  Approach  and  Accomplished  Work 

The  last  three  decades  have  seen  a  rapid  advance  in  the  mathematical  modeling  of  turbulence  data.  Encouraged  partly 
by  the  fact  that  complex,  seemingly  random,  behavior  can  be  well  modeled  by  simple  low  dimensional  deterministic 
nonlinear  systems,  many  researchers  have  hypothesized  that  turbulence  can  be  modeled  using  chaos  theory.  Early 
experiments  in  Rayleigh-Bernard  thermal  convection  [19],  Taylor-Couette  flow  between  cylinders  [24],  closed  loop 
thermosiphons  [3],  turbulent  boundary  layers  for  open  flow  over  a  wall  [1],  and  surface  wave  propagation  in  a  saltwater 
medium  [12],  have  in  part  verified  this  hypothesis.  However,  there  is  a  lack  of  such  clear  proof  in  other  experiments 
and  in  data  collected  from  uncontrolled  environments  such  as  in  aerothermal  data.  More  recent  efforts  in  turbulence 
modeling  have  shown  chaos  theory  to  be  useful  in  interpreting  local  phenomenon  and  flow  stability.  Chaos  is  now 
generally  considered  to  have  an  important  (yet  limited)  role  in  the  modeling  of  turbulence  but  not  as  a  theory  capable 
of  describing  turbulent  flow  in  detail.  Even  if  the  turbulence  is  viewed  as  a  deterministic  event,  the  high  degrees  of 
freedom  (dimension)  of  the  flow  makes  the  use  of  chaos  theory  impractical.  Hence,  the  treatment  of  turbulence  as  a 
stochastic  process  prevails  and  (like  low  dimensional  chaotic  models)  is  well-matched  for  handling  a  prevalent  notion 
about  turbulence,  namely,  that  it  has  certain  ‘self-similar’  or  ‘fractal’  properties.  Loosely  speaking,  this  property  means 
that  certain  measures  of  turbulence  data  are  invariant  upon  rescaling  the  data,  but  the  measures  are  quite  different 
for  stochastic  and  deterministic  models  (e.g.,  invariance  in  distributional  properties  in  the  former  and  invariance  in 
space-filling  properties  in  the  latter).  Both  approaches  are  capable  of  generating  simulated  series  that  mimic  some 
properties  of  actual  turbulence,  but  there  is  much  work  yet  to  be  done  to  ascertain  which  class  of  models  or  combination 
thereof  is  the  best  to  use  to  answer  questions  of  practical  importance. 

Most  deterministic  and  stochastic  approaches  assume  homogeneity  in  time  across  all  scales  of  interest.  In  this 
report,  we  discuss  methods  that  can  be  used  for  turbulence  with  time-varying  properties.  As  we  show  below,  there 
is  strong  evidence  to  support  the  notion  that  turbulence  exhibits  time  varying  power  law  behavior  over  finite  ranges 
of  scale.  Because  of  the  temporally  localized  and  scale  dependent  nature  of  wavelet  transforms,  wavelet  techniques 
provide  a  natural  framework  for  the  analysis  of  physical  phenomena  that  exhibit  variations  across  time  and  within  a 
finite  range  of  scales.  This  is  a  departure  from  techniques  that  assume  a  priori  either  a  self-similar  structure  across 
all  scales  in  the  data  or  stationarity  in  fractal  measures  as  a  function  of  time.  While  a  wavelet  decomposition  of  a 
turbulence  time  series,  say  {Xt},  is  based  on  using  self-similar  analysis  tools  (i.e.,  wavelets),  it  does  not  make  an  a 
priori  assumption  that  {Xt}  is  evolving  in  a  self-similar  manner.  By  making  a  careful  study  of  each  scale  as  it  evolves 
in  time  and  of  the  relationships  of  the  scales  to  each  other,  we  can  then  evaluate  how  reasonable  it  is  to  use  models 
that  postulates  a  tight  coupling  across  scales,  e.g.,  a  time-evolving  power-law  processes. 

In  our  effort  to  address  these  issues,  we  use  a  stochastic  fractal  time  series  model  known  as  a  fractionally  differenced 
process  (FDP)  [4].  An  FDP  has  a  clear  advantage  over  similar  models  such  as  fractional  Brownian  motion  (fBm)  and 
fractional  Gaussian  noise  (fGn)  for  the  following  reasons: 

•  Unlimited  power  law  exponent  range.  Both  fBm  and  fGn  are  stochastic  power  law  processes  in  that  their 
spectral  density  functions  are  approximately  proportional  to  \f\a  where  a  is  the  power  law  exponent.  However, 
an  fBm  is  limited  to  an  exponent  range  of  —3  <  a  <  —1  while  a  fGn  is  limited  to  —1  <  a  <  1.  An  FDP  is  also 
a  stochastic  power  law  process,  but  it  has  no  such  limitation  on  its  exponent  range  and  is  theoretically  valid  for 
CK  £  K.. 

•  Model  continuity.  Because  fBm  and  fGn  jointly  cover  power  laws  ranging  from  —3  up  to  1  (which  is 
adequate  to  model  some  —  but  not  all  —  turbulent  phenomena),  it  is  tempting  to  select  between  fBm  and  fGn 
to  model  various  turbulent  series;  however,  at  a  =  —1  (which  is  known  as  1//,  pink,  or  flicker  noise),  there 
is  a  discontinuity  between  the  fGn  and  fBm  models  at  high  frequencies,  which  can  lead  to  problems  in  model 
selection.  Unfortunately,  many  real  world  phenomena  exhibit  1/f  noise  [2].  An  FDP  has  no  such  discontinuity. 

•  Tractable  SDF  AND  AC  VS.  In  contrast  to  the  fBm  and  fGn  models,  an  FDP  has  tractable  forms  for  both 
its  spectral  density  function  (SDF)  and  (when  stationary)  corresponding  auto  covariance  sequence  (ACVS). 

•  Model  flexibility.  Both  autoregressive  (AR)  and  moving  average  (MA)  components  can  be  added  to  an 
FDP  to  provide  more  flexibility  in  modeling  high  frequency  spectral  content.  High  end  frequencies  are  typically 
contaminated  by  exogenous  noise  sources,  and  thus  flexible  modeling  of  these  regions  is  appropriate.  The  fBm 
and  fGn  models  are  not  readily  amenable  to  such  additions  as  it  further  complicates  the  SDF  and  ACVS  functions. 

In  this  report,  we  use  recently  developed  wavelet  techniques  to  estimate  the  parameters  of  FDP  models  applied  to 
aerothermal  turbulence  data.  There  are  a  number  of  advantages  in  using  the  discrete  wavelet  transform  (DWT)  on 
turbulence  data: 
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•  Decomposition  BASED  on  SCALE.  Turbulence  is  known  to  exhibit  fluctuations  at  various  spatial  scales,  and 
hence  the  DWT  is  a  natural  analyzer. 

•  Decorrelation  of  time  series.  This  is  crucial  for  obtaining  viable  approximate  maximum  likelihood  esti¬ 
mates  of  FDP  parameters  [6]  (see  Sec.  5.3.3  for  details). 

•  Localized  time  and  scale  content.  Each  wavelet  coefficient  is  localized  in  time,  allowing  us  to  track  changes 
in  the  characteristics  of  a  time  series  at  a  particular  scale  as  a  function  of  time. 

•  Separation  of  nonlinear  trends  from  noise.  The  wavelet  coefficients  are  inherently  “blind”  (invariant) 
to  nonlinear  polynomial  trend  contamination  in  the  original  time  series  [8,  7]. 

4.1  Theoretical  Developments  and  Background 

Here  we  develop  the  background  and  theory  for  the  models  and  corresponding  estimators  used  to  analyze  aerothermal 
turbulence  data. 


4.1.1  Fractionally  Differenced  Processes 

An  FDP  is  used  to  model  data  with  a  slowly  decaying  autocovariance  function,  a  hallmark  of  long  memory  dependence. 
The  process  was  originally  proposed  by  Granger  and  Joyeux  [10]  and  Hosking  [11]  as  an  extension  to  ARIMA(0,  J,  0) 
models  to  allow4for  fractional  values  of  S. 


Definition  2.1  Let  6  €  R  and  a2  >  0.  We  say  that  {Xt}te z  is  an  FDP(6,cr2)  if  it  has  a  spectral  density  func¬ 
tion 


Sx(f)  = 


a 


2 

€ 


|2sin  (7t/)|2<s  ’ 


I/I  <  1/2, 


(1) 


where  a2  is  the  innovation  variance,  and  5  is  the  fractional  difference  parameter. 

When  -1/2  <  6  <  1/2,  an  FDP  is  stationary  with  autocovariance  sequence 

_  g;r(i^M)r(T  +  f)  (2) 

,T  7tP(t  +  1  —  6) 

By  inspection  of  Eq.  1,  an  FDP(<5,  a2)  process  approximately  obeys  a  colored  noise  process  (Sx  oc  |/|Q)  at  low 
frequencies  with  a  =  -28  (the  error  in  this  approximation  is  quite  small  for  |/|  <  1/8).  This  is  an  excellent  model 
for  data  exhibiting  power  law  behavior  in  both  the  stationary  (8  <  1/2)  and  nonstationary  (8  >  1/2)  regimes.  For 
8  >  1/2  in  Eq.  1,  we  obtain  a  class  of  nonstationary  FDP  that  is  stationary  if  {Xt}  is  differenced  [5  +  1/2J  times. 
More  generally,  an  FDP  is  closed  under  differencing  operations,  i.e.  an  FDP(<5,  of)  that  has  been  subjected  to  a  Bl 
order  differencing  operation,  yields  an  FDP(<5  -  B,o2).  A  pure  power  law  process  subjected  to  the  same  differencing 
operation  will  not  yield  another  pure  power  law  process,  thus  making  an  FDP  a  more  flexible  and  robust  model. 


4.1.2  The  Discrete  Wavelet  Transform 

Consider  a  uniformly  sampled  time  series  {Xf}^1  with  N  divisible  by  2J  for  J  €  N.  For  L  an  even  positive  integer, 
let  t>e  a  Daubechies  [9]  wavelet  filter  with  squared  gain  function 

«,,!</)  ^  2sin >/)  Xf  (i/2  "  1  +  ')  cos ?'(„/).  (3) 

Equation  3  does  not  uniquely  define  a  wavelet  filter,  and  an  additional  phase  criterion,  such  as  extremal  or  least 
asymmetric  phase,  must  be  imposed  to  do  so.  Let  {gi}i~0l  be  a  scaling  filter,  defined  by  the  quadrature  mirror  filter 
(QMF)  relation 

91  =  (-l)'+1/lL-l-i-  (4) 
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(5) 


The  squared  gain  function  for  a  Daubechies  scaling  filter  is  given  by 

L /2 _ 1 

Qi,Uf)  =  2cosl(tt/)  J2  (^2  i 1  +  sin2W)- 

The  wavelet  and  scaling  filter  are  used  in  a  “pyramid”  algorithm  [18]  to  transform  {Xt}  into  a  collection  of  wavelet 
coefficients  Wjit  and  scaling  coefficients  Vht  that  can  be  grouped  by  physical  scale  7j  =  2]~1At  for  j  —  1, . . .  ,  J,  where 
At  is  the  sampling  time  between  contiguous  observations  in  {Xt}  (for  simplicity,  we  set  At  to  unity  throughout  this 
report  for  the  mathematical  derivation  of  FDP  parameter  estimators).  The  collection  of  coefficients  W,  given  by 

W  =  (WllW2 . Wj.Vj)  (6) 

where  Wj  are  the  Nj  =  N/2j  wavelet  coefficients  and  Vj  are  the  N/2J  scaling  coefficients,  represents  the  discrete 
wavelet  transform  of  {Xt}.  Implementation  of  the  DWT  begins  by  defining  the  zeroth  level  scaling  coefficients  to  be 
the  original  time  series:  Vo,t  =  {Xt}^1.  The  level  j  wavelet  coefficients  Wjit  and  scaling  coefficients  Vjit  are  then 
formed  by 

L- 1  L- 1 

Wj.t  =  hjVj- l,2t+l— l  mod jVj-i  and  Vj,t  =  <7t^j-l,2t+l-t  modNj-i  (7) 

1=0  1=0 

where  t  =  0, . . .  The  pyramid  algorithm  represented  by  Eq.  7  can  also  be  interpreted  as  a  series  of  cascade  filter 

bank  operations  since  the  wavelet  filter  used  to  create  the  Wj, t  is  an  approximate  bandpass  filter  with  nominal  pass- 
band  /  G  [l/47j,  1/2 Tj]  while  the  corresponding  scaling  filter  used  to  create  the  Vj,t  is  a  low  pass  filter  with  nominal 
pass-band  /  G  [0,  l/4Tj].  Figure  1  shows  the  squared  gain  responses  for  a  20-tap  Daubechies  wavelet  filter  Hj^oif) 
and  scaling  filter  Qj, 20(f)  for  j  =  1, ...  ,4  and  illustrates  the  filter  bank  perspective.  When  considering  the  statistical 


Figure  1:  The  squared  gain  functions  for  Daubechies  least  asymmetric  20-tap  wavelet  filter  for  levels  j  =  1, . . .  ,4.  For 
simplicity,  the  sampling  period  was  set  to  unity  to  create  the  frequency  axis  and  establishes  the  Nyquist  frequency  at 
1/2.  The  dotted  vertical  lines  identify  the  octave  bands  over  which  the  wavelet  and  scaling  filters  are  associated.  The 
scaling  of  the  left  (right)  ordinate  is  representative  of  the  DWT  (MODWT)  squared  gain  function. 

properties  of  DWT  coefficients,  it  is  useful  to  divide  the  wavelet  and  scaling  coefficients  into  boundary  and  interior 
coefficients.  Boundary  coefficients  are  those  subject  to  change  if  the  ‘mod’  operator  were  to  be  dropped  in  Eq.  7. 
These  boundary  coefficients  must  be  ignored  (for  example)  when  calculating  unbiased  wavelet  variance  estimates  ^see 
Sec.  4.1.5  for  details).  The  number  of  DWT  boundary  coefficients  is  given  by  min{Lj  ,Nj}  where  Lj  =  f(L-2)(l-2  J)] 
(for  large  j,  Lj  =  L  -  2).  The  remaining  DWT  coefficients,  of  which  there  are  Mj  =  Nj  -  min {Lj,iVj},  make  up 
the  set  of  interior  coefficients.  Figure  2  shows  a  DWT  transform  of  a  small  segment  of  aerothermal  data.  A  physical 
interpretation  of  the  DWT  based  upon  Daubechies’  class  of  compactly  supported  wavelet  filters  is  that  the  Wj,t  measure 
the  difference  (centered  at  a  particular  time)  between  adjacent  weighted  averages  of  {Xt}  at  scale  r:.  Large  values 
for  the  Wj,t  indicate  that  {Xf}  tends  to  have  large  variations  over  time  scales  of  length  Tj.  Similar  to  the  wavelet 
coefficients,  the  scaling  coefficients  are  weighted  averages  of  {X(}  on  a  scale  of  Tj. 
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Figure  2:  DWT  of  aerothermal  data  segment  using  Daubechies  8-tap  least  asymmetric  filters.  The  number  in  the  curly 
brackets  next  to  each  subband  represents  the  amount  of  circular  shift  imposed  to  adjust  the  coefficients  to  approximate 
zero  phase.  A  negative  shift  value  implies  an  advance,  or  left  circular  shift,  of  the  coefficients. 


4.1.3  Maximum  Overlap  Discrete  Wavelet  Transform  (MODWT) 

Despite  its  popularity,  the  DWT  has  a  few  practical  limitations: 

•  Dyadic  LENGTH  REQUIREMENT:  The  DWT  can  be  adapted  to  accommodate  arbitrary  length  sequences  via 
polynomial  extensions  of  the  scaling  coefficients.  However,  selecting  an  appropriate  number  of  end  points  to  fit 
or  the  order  of  fit  is  not  a  trivial  task.  Other,  less  complicated  techniques  may  be  used,  but  generally  involve 
either  a  lot  of  bookkeeping  or  are  too  simple  to  accurately  portray  the  dynamics  of  the  scaling  coefficients. 

•  Lack  of  shift  invariance:  The  decimation  operation  makes  the  DWT  a  non  shift-invariant  transform. 

•  Inappropriate  for  instantaneous  time  dependent  measures.  The  DWT  coefficients  represent  too  coarse 
of  a  “slicing”  in  the  time  domain  to  provide  instantaneous  FDP  parameter  estimates. 

As  an  alternative,  we  can  use  a  nondecimated  form  of  the  DWT,  known  as  the  maximum  overlap  DWT  (MODWT) 
that  has  two  main  advantages:  (1)  it  handles  arbitrary  length  sequences  inherently  and  (2)  the  MODWT  coefficients 
can  be  circularly  shifted  to  form  an  approximate  zero  phase  projection  of  {Xt}  if  the  wavelet  filter  is  approximate  linear 
phase  (as  is  the  case  for  Daubechies  least  asymmetric  and  Coiflet  filters).  Additionally,  the  number  of  coefficients  in 
each  scale  is  equal  to  the  number  of  points  in  the  original  time  series.  This  refined  slicing  of  the  data  in  combination 
with  the  zero  phase  property  allows  us  to  calculate  instantaneous  statistical  measures  of  the  data  across  scale. 

As  in  the  DWT,  implementation  of  the  MODWT  begins  by  defining  the  zeroth  level  scaling  coefficients  to  be  the 
original  timejseries:  V0,t  =  Let  hi  s  ft«/V5and  g  =  gi/V 2  for  l  =  0, . . .  ,  L  -  1.  The  MODWT  wavelet 

coefficients  Wj,t  and  corresponding  scaling  coefficients  Vj,(  are  formed  by 

L— 1 _  L-l 

Wj}t  =  mod N  and  Vjft  =  giVj-l,t-2*-1l  modN 

i=0  1=0 

where  j  =  1, . . .  ,  J  and  t  =  0, . . .  ,  N  -  1.  The  collection  of  coefficients  W,  given  by 

W  =  (W1,Wa,...>Wj1V./)  '  (9) 

where  W_,  are  the  N  wavelet  coefficients  at  scale  r,  and  Vj  are  the  N  scaling  coefficients  at  scale  tj,  represents  the 
MODWT  of  {A"(}.  The  number  of  boundary  coefficients  is  Lj  =  (2J  —  1)(L  —  1)  +  1.  Figure  3  shows  a  MODWT  of 
aerothermal  data  using  Daubechies  8-tap  least  asymmetric  filters. 
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(10) 


If  the  sample  size  N  is  a  power  of  two,  the  MODWT  coefficients  and  DWT  coefficients  are  related  by 

Wj,t  =  2*/2WJl2i(t+1)-i. 

The  DWT  can  thus  be  seen  as  a  scaled  and  subsampled  version  of  the  MODWT.  This  relation  can  be  visualized,  Jor 
example,  by  comparing  the  DWT  scaling  coefficients  14,  t  in  Fig.  2  with  the  corresponding  MODWT^coefficients  14, t 
in  Fig.  3.  Using  Eq.  10,  the  MODWT  squared  gain  functions  are  defined  as  ?f(/)  =  2 and  Q(f)  =  2 ~3Q{f) 
(see  Fig.  1). 
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Figure  3:  MODWT  of  aerothermal  data  segment  using  Daubechies  8-tap  least  asymmetric  filters. 


4.1.4  Time  independent  wavelet  variance 

4.1.5  MODWT  wavelet  variance 

Since  the  SDF  for  the  MODWT  interior  wavelet  coefficients  is  given  by  Uj,L(f)  Sx(f),  the  variance  of  var{Wjit}  can 
be  expressed  as 

__  r1/2 

var {Wj,t}=  HjMSxV)  df-  (n) 

J  —  1/2 


Using  the  approximation  that  Hj<L{f)  is  an  ideal  bandpass  filter  over  |/|  €  [l/22+1, 1/22']  and  taking  into  consideration 
the  even  symmetry  of  SDFs,  an  approximation  to  the  wavelet  Variance  is  given  by 

_  fl/2* 

var{Wj,t} « 2  /  Sx(f)df.  (12) 

J 1/23  +  1 


For  fractional  difference  processes,  we  have 


var {Wj,t}  «  2 


r'/V 
J 1/2J+1 


h/2S+ >  |2  sin  (vr/)[2rf 
When  j  >  3,  so  that  sinrr /  «  irf,  Eq.  13  can  be  approximated  by 

var{W,-,t}  «  a\  c(<5)  rf~\ 


df. 


(13) 


(14) 


where  c(S)  =  -  22<5_1)/(1  -  25).  Equation  14  suggests  that  a  direct  means  of  estimating  5  is  to  fit  ajeast 

squares  line  to  the  log  of  the  estimated  wavelet  variance.  The  slope  of  the  line,  say  (3,  that  best  fits  log(var{ Wjtt}) 
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versus  log(rj)  in. a  least  squares  sense  is  related  to  the  FDP  parameter  by  S  =  ((3+l)/2  and  the  pure  power  law  (PPL) 
exponent  by  a  =  —  (/3  4- 1)* 

For  finite  length  time  series,  MODWT-based  estimates  of  the  wavelet  variance  are  given  by 


and 


Biased:  %  =  jj  £ Wl' 

t- 0 


(15) 


N-l 


Unbiased:  v\  =  V 
»>#■  ^ > 


W. 


Mj  ~ 

J  t=Lj-l 


(16) 


where  Mj  =  N  -  Zj  +  1  is  the  number  of  MODWT  interior  wavelet  coefficients.  As  a  caveat,  it  should  be  noted  that 
the  wavelet  variance  estimates  are  somewhat  sensitive  to  the  order  L  of  the  wavelet  filter  used  in  the  analysis.  In 
particular,  studies  by  one  of  us  [22]  have  shown  that  there  can  be  a  significant  bias  in  the  slope  of  the  (2-tap)  Haar 
wavelet  variance  estimates  in  log-log  space  due  to  a  spectral  leakage  phenomenom.  The  leakage  is  attenuated  as  the 
filter  order  is  increased,  and  the  wavelet  variance  estimates  typically  stabilize  for  L  >  8,  as  is  true  for  all  analyses 
presented  in  this  report. 


4.2  Development  of  fully-functional  commercial-grade  software 

4.2.1  Enhancement  of  MathSoft’s  S+Wavelets  module 

We  have  developed  wavelet-based  software  written  entirely  in  MathSoft’s  S-PLUS  language  to  estimate  parameters  of 
a  fractional  difference  (FD)  process  as  a  model  for  turbulence  data.  These  functions  will  be  incorporated  directly  into 
the  S+Wavelets  module  built  for  S-Plus  on  UNIX  platforms.  The  main  functions  developed  thus  far  for  the  phase 
I  contract  are: 

•  wavFDPMLE:  Maximum  likelihood  (ML)  estimation  of  FD  parameters.  This  function  optimizes  a  maximum 
likelihood  functional  to  provide  estimates  of  the  FD  parameter  6  and  innovation  variance.  The  ML  functional  is 
approximated  using  discrete  wavelet  transform  (DWT)  coefficients  and  an  internally  defined  FD  spectral  density 
function.  This  function  has  the  ability  to  operate  in  detrending  mode  (nonlinear  and  nonstationary  trends  in 
the  data  are  excluded)  and  recentering  mode  (the  sample  mean  is  removed  from  the  data  as  a  preprocessing 
measure).  The  user  also  has  the  option  to  choose  a  faster  but  slightly  less  accurate  means  of  approximation 
(numerical  integration  of  the  spectral  density  function  (SDF)  is  replaced  by  a  band  pass  approximation  scheme). 
Finally,  the  user  is  allowed  to  select  the  wavelet  subbands  over  which  the  estimates  are  made. 

•  wavFDPMLETime:  Time  dependent  ML  estimation  of  the  FD  parameter  S .  This  function  calculates  the  local 
ML  estimate  of  S  using  MODWT  coefficients.  Biased  and  unbiased  versions  are  available. 

•  wavFDPWLSE:  Weighted  least  squares  estimate  (WLSE)  of  the  FD  parameter  6.  This  function  calculates  the 
weighted  least  squares  power  law  exponent  of  MODWT  -wavelet  variance  estimates  and  converts  it  to  6.  The 
weights  are  calculated  based  on  an  assumption  of  a  chi-square  distribution.  Biased  and  unbiased  versions  are 
available.  The  variance  of  the  estimated  5  is  also  returned. 

•  wavFDPLSETime:  Time  dependent  least  squares  estimate  of  the  FD  parameter  6 .  This  function  calculates  the 
least  squares  instantaneous  power  law  exponent  of  MODWT  wavelet  variance  data  and  converts  it  to  S.  Biased 
and  unbiased  versions  are  available.  The  variance  of  the  instantaneous  S  estimates  is  also  returned. 

In  addition  to  the  above  functions,  the  following  helper  functions  were  developed  explicitly  for  (or  greatly  enhanced 
as  a  result  of)  the  current  phase  I  contract: 

•  wavBoundary:  wavelet  transform  boundary  coefficient  separation. 

•  WAVCOVARIANCE:  time  (in)dependent  (un)biased  wavelet  covariance  estimation. 

•  wavFDPBand:  mid-octave  spectral  density  function  evaluation  using  band-pass  approximation. 

•  vvavFDPSDF:  FDP  spectral  density  function  definition. 
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•  wavFDPTimeInpuT:  preprocessing  function  for  time  dependent  FDP  parameter  estimation  functions. 

•  WAVGAIN:  general  wavelet  squared  gain  function  development. 

•  wavGainDaubechieS:  Daubechies  wavelet  squared  gain  response  function. 

•  wavHomO:  homogeneity  of  variance  test. 

•  WAVlNDEX:  identification  of  wavelet  boundary  coefficients  and  zero-phase  shift  factors. 

•  wavMODWT:  maximum  overlap  discrete  wavelet  transform. 

•  wavMODWTLonG:  efficient  MODWT  for  “large-scale”  time  series. 

•  wavPermute:  permutation  function  for  vector  sequences. 

•  wavShift:  wavelet  zero  phase  shift  function. 

•  wavTitle:  extraction  of  series  name  for  multiple  classes. 

•  wav  Variance:  time  (in)dependent  (un)  biased  wavelet  variance  estimation. 

•  wavVarianceInput:  preprocessor  for  wavelet  variance  function. 

•  wavZeroPhase:  calculation  of  zero  phase  shift  indices  for  Daubechies’  Coiflet  and  symmlet  filters. 

The  functions  are  designed  to  take  advantage  of  the  object  oriented  nature  of  the  S-Plus  language.  Each  of  the 
functions  listed  above  can  accept  input  from  a  variety  of  classes  including 

•  DWT:  Discrete  wavelet  transform  class. 

•  MODWT:  Maximum  overlap  discrete  wavelet  transform  class. 

•  MODWTLONG:  Maximum  overlap  discrete  wavelet  transform  class  (long  time  series  form). 

•  WAVEBOUND:  Class  containing  wavelet  transform  coefficients  separated  into  boundary  and  interior  coefficients. 
The  boundary  coefficients  are  those  subject  to  circular  filtering  operations  and  are  excluded  for  unbiased  and 
detrended  FD  parameter  estimates.  This  class  handles  both  DWT  and  MODWT  class  data 

•  WAVECOV:  Wavelet  covariance  class. 

•  WAV  EVA  R:  Wavelet  variance  class. 

•  TS:  The  S-Plus  language  time  series  class. 

•  signalSeries:  The  S-Plus  language  signal  series  class. 

•  NUMERIC:  A  vector  of  numeric  data. 

New  classes  are  being  developed  for  the  functions  described  as  well  as  plot,  print,  and  summary  methods  which  uniquely 
display  the  data  in  a  way  that  is  most  relevant  to  the  researcher. 

5  Results 

5.1  Aerothermal  data  collection 

In  this  section  we  examine  a  uniformly  sampled  7.5  million  point  aerothermal  turbulence  data  set.  Aerothermal 
data  is  a  temperature  (related)  time  series  gathered  by  an  aircraft  flying  at  a  constant  elevation  and  constant  speed. 
The  measurement  system  is  a  cold-wire  probe,  externally  attached  to  the  aircraft,  that  senses  fluctuations  in  local 
temperature  by  means  of  a  proportional  change  in  wire  current.  The  data  spans  a  total  distance  of  137.3  km  with  a 
spatial  resolution  of  approximately  1.83  cm.  For  time  dependent  estimates  of  the  FDP  parameters,  this  large  data  set 
was  divided  into  contiguous  nonoverlapping  10, 000  point  blocks  and  the  series  in  each  block  analyzed.  Figure  4  shows 
the  aerothermal  data  smoothed  with  a  moving  average  filter. 
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Figure  4:  Moving  average  of  the  aerothermal  data  taken  over  100  point  nonoverlapping  blocks. 

5.2  Wavelet  variance  estimates 

A  time  dependent  wavelet  variance  estimator  was  used  to  assess  the  validity  of  power  law  scaling  in  turbulence  data. 
Figure  5  shows  an  example  of  unbiased  wavelet  variance  estimates  for  one  10,  000  point  block  of  aerothermal  data. 
Multiscale  linear  trends  in  wavelet  variance  plots  suggest  the  presence  of  spectral  power  law  behavior.  However,  these 
trends  appear  only  over  finite  ranges  of  scale.  For  the  example  shown  in  Figure  5,  a  different  power  law  behavior  is 
seen  over  scales  n  -  r4,  r5  -  r7,  and  r8  -  no-  These  patterns  change  with  different  blocks,  indicating  time  varying 
power  law  behavior. 


iog(  t.  ) 


Figure  5:  The  unbiased  MODWT  wavelet  variance  v\{rj)  of  sample  aerothermal  data  using  Daubechies  least  asym¬ 
metric  8-tap  wavelet  filters.  The  confidence  intervals  are  based  on  a  chi-square  distribution  assumption  where  the 
equivalent  degrees  of  freedom  (EDOF)  are  calculated  (1)  using  a  large  sample  approximation  to  the  mean  and  variance 
of  v2x(rj)  for  scales  n,...  ,r5  and  (2)  under  an  assumption  that  the  spectral  density  function  is  flat  over  the  nomi¬ 
nal  passbands  in  which  the  wavelet  coefficients  are  associated  for  rg, . . .  ,tio.  See  [22]  for  details  of  wavelet  variance 
confidence  intervals  and  their  development. 

To  better  illustrate  these  pattern  fluctuations,  scatter  plots  of  the  wavelet  variance  estimates  are  shown  in  the  lower 
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triangle  of  Fig.  6.  We  define  VjiP  =  \og(V2x  p(r,))  for  j  =  1, ....  9,  with  the  index  p  =  1, . . .  ,  P  representing  the  pth 
10, 000  point  nonoverlapping  block  in  a  uniform  partition  of  the  7.5  million  point  aerothermal  time  series  described  in 
Sec.  5.1.  A  scatter  plot  $_,,*  is  produced  by  plotting  VjiP  versus  Vfc,p  for  j  ^  k.  Inspection  of  Fig.  6  reveals  a  gradual 
spread  in  the  wavelet  variance  estimates  as  we  transverse  scale  from  'I' i ,2  to  $8,9  off  the  main  diagonal.  This  spread 
is  proportional  to  the  variation  in  6  over  time.  We  can  better  characterize  the  spread  in  the  scatter  plots  under  the 
hypothesis  that  the  var{Wj,t}  obeys  a  FDP(<5,  cj).  Taking  the  logarithm  of  Eq.  14,  we  obtain 

Vj  *  log(a£2)  +  log(c(*))  +  ( j  -  1)(25  -  1),  (17) 

that  can  be  further  simplified  since  log(c(<5))  closely  follows  the  tri-linear  curve 

{-1.60 -2.90  <5  if  <5  <  — 1, 

-0.95  -  2.30  S  if  -1  <  <5  <  2,  (18) 

-2.10  -  1.70  <5  if  5  >  2. 

As  most  physical  power  law  processes  are  in  the  exponent  range  of  -4  <  a  <  2  (corresponding  to  -1  <  6  <  2),  we 
can  use  a  least  squares  linear  approximation  of  log(c(<5))  to  reduce  Eq.  17  to 

Vj  «  [log(cr2)  +  0.05  -  j]  +  2 (j  -  2.15)  6 

=  nij+bjSj  (19) 

This  approximation  of  Vj  yields  insight  into  the  clustering  of  the  data  in  the  scatter  plots.  Any  variation  of  the 
innovation  variance  or  FDP  parameter,  for  example,  will  cause  estimates  of  Vj  to  migrate  in  the  scatter  plots.  Hypo¬ 
thetically,  if  T7Tj  «  rtik  is  constant  and  bj  =  bk  is  allowed  to  vary  over  time,  the  points  in  the  scatter  plot  will  migrate 
at  a  slope  bk/bj  =  (k  -  2.15) /(j  -  2.15).  This  behavior  can  be  seen  for  example  in  $6,7,  $7,8  and  $8,9  of  Fig.  6.  The 
lines  in  the  lower  triangle  of  Fig.  6  are  drawn  for  reference  with  a  slope  of  bk/bj  (under  an  FDP  model,  these  lines 
are  valid  only  for  j,k  >3).  If  bj  grows  at  a  different  linear  rate  than  does  bk  for  a  fixed  innovation  variance,  then 
the  scatter  plot  will  follow  a  curved  path.  These  curved  paths  are  particularly  evident  in  $5,6  and  $5,7-  Finally,  if 
neither  vij  nor  bj  vary  much  over  time,  the  result  is  a  small  cluster  of  points  in  the  scatter  plot.  This  tightly  coupled 
clustering  can  be  seen,  for  example,  in  the  upper  triangle  of  Fig.  6  covering  scales  t\  -  r4.  An  important  implication  of 
the  scatter  plots  is  to  show  that  no  single  FDP  can  adequately  model  the  turbulence  data  over  all  scales.  Secondly,  it 
is  apparent  that  in  many  of  the  scatter  plots  for  adjacent  scales  ($j,j±i)  there  is  a  convergence  towards  a  linear  trend 
at  slope  bj/bj±i,  which  is  consistent  with  a  power  law  coupling  across  those  scales. 


Figure  6:  Scatter  plots  and  differential  scatter  plots  of  log(i 'x,b(Ti))  estimates  for  j  =  1, ...  ,9  and  p  =  1, ...  ,  420.  For 
purposes  of  comparison,  all  axes  are  uniformly  scaled.  See  text  for  details. 

We  can  also  calculate  “differential  plots”  f ij>k  to  gain  insight  on  the  coupling  and  variability  of  the  estimated  slope 
of  the  Vj  relative  to  that  of  the  Vfc.  Define  %p  =  Vj+1,p  -  Vj,p  to  be  a  gross  approximation  of  the  (log)  wavelet 


13 


variance  slope  between  scales  Tj  and  Tj+ i  for  j  =  1, . . .  ,  J  —  1  in  the  data  block  for  p  —  1, . . .  ,  P.  The  differential 
scatter  plot  Sljik  is  produced  by  plotting  the  statistic  djtk,P  =  %P/DfctP  -  1  versus  the  block  number  p.  The  Qjtk 
qualitatively  track  the  coupling  of  the  slopes  D j  and  3k  as  the  block  number  (time)  varies  and  are  plotted  in  the  upper 
triangle  of  Fig.  6.  If  djtktP  =  0,  then  the  slopes  are  tightly  coupled  and  suggests  that  a  power  law  model  is  appropriate 
over  the  corresponding  scales.  An  example  of  this  behavior  is  seen  in  plots  Oi,2,  fli,3  and  ^2,3  as  they  all  fall  close  to 
the  reference  zero  line.  If  djfktP  ±  0  consistently  over  all  blocks,  then  the  two  slopes  are  not  consistent  with  a  power  law 
relationship.  This  behavior  is  seen,  for  example,  in  plots  and  fij, 5  for  j  —  1, ...  ,3.  To  quantify  the  distribution 
of  the  data  in  the  Q,j>k,  each  plot  is  overlaid  with  a  boxplot  whose  thick-lined  center  represents  the  mean  while  the  top 
and  bottom  of  the  box  spans  an  estimate  of  2  sample  standard  deviations  about  the  mean. 

Using  the  results  shown  in  Fig.  5  and  6,  we  propose  to  fit  separate  FDP  models  to  the  aerothermal  turbulence 
data  over  4  finite  ranges  of  scales:  t\  —  73,  7*2  —  t4,  t 5  —  77,  and  7$  —  The  overlap  of  the  first  two  scale  ranges  is 
purposely  set  to  explore  the  apparent  periodicities  appearing  in  scales  r3  and  r4  (see  ft2,3  for  example  in  Fig.  6).  In 
what  follows,  we  discuss  wavelet-based  time  independent  and  dependent  estimates  for  the  FDP  parameters  via  least 
squares  and  maximum  likelihood  techniques. 


5.3  Estimating  FDP  parameters  with  wavelets 

5.3.1  Block  dependent  weighted  least  squares  estimator 

Here  we  develop  a  weighted  least  squares  estimator  for  6  based  upon  the  unbiased  MODWT-based  wavelet  variance 
estimator  at  scales  Tr  The  distribution  for  v\{r5)  is  approximately  that  of  a  chi-squared  random  variable 

(rj) A/ji  where  rjj  is  the  equivalent  degree  of  freedom  (Sec.  8.4  of  [22]  discusses  three  methods  for  determining 


77 j,  the  simplest  of  which  is  to  set  r]j  =  max{Mj/2J,  1}).  Define 


Y{rj)  =  l°g(i>x(Ti))  “V'  T  J  +  log 


(?) 


Vj 


(20) 


where  t />(•)  is  the  digamma  function.  We  can  now  formulate  a  linear  regression  model  Y(rj)  =7  +  /31og(ri)  +  ej  where 
e  =  \og{i'x(Ti)/l/x(Tj))  ~  +  log  (jjj /2)  is  an  error  term  with  zero  mean  and  variance  ip'(r}j/2)  [%!>'{ •)  is  the 

trigamma  function)  [22].  The  weighted  least  squares  estimate  (WLSE)  of  the  slope  term  (5  is 


A  logfomr,-)  -  log (r,-) 

l  wlse  ~  log2 (Tj)  -  (E"j  log(^j))2 


(21) 


where  uj:  =  [V)'(^)]_1-  All  sums  in  Eq.  21  are  over  j  =  The  weighted  least  squares  estimate  of  the  FDP 

parameter  is  then 


$wlse  —  2  ifiwlse  ^)- 


Furthermore,  the  variance  of  f3wise  is 

var{(3wise}  = 


Ewi 


E  E  ^3  log2  (tj  )  -  (E  u3  i°g(rj))2 

and  thus  the  variance  of  the  6wlse  estimate  is  found  by 


(22) 


(23) 


uar{(Eise}  =  -var0wise}- 


(24) 


Figures  7  and  8  show  the  weighted  least  squares  estimates  of  a  for  the  segmented  turbulence  data  set  described 
in  Sec.  5.1.  The  power  law  exponent  was  estimated  over  finite  ranges  of  scale,  commensurate  with  those  deemed 
appropriate  by  inspection  of  the  wavelet  variance  estimates  (Fig.  5  for  example)  and  corresponding  scatter  plots 
(Fig.  6).  Due  to  the  sampling  variability  present  in  the  wavelet  variance  estimates,  we  smoothed  all  &wise  for  scales  Tj 
such  that  j  >  5  with  a  moving  average  20  point  window  with  a  19  point  overlap.  For  reference  and  simplicity,  we  define 
the  term  q,  k  to  mean  the  estimated  power  law  exponent  over  scales  rJ,rj+1,...  ,rfc.  Figure  7  shows  the  estimates 
Qi,3,  <*2,4  and  q5,7.  An  immediate  observation  of  these  results  is  the  apparent  wide  range  of  a  for  various  scales, 
spanning  stationary  blue  noise  to  nonstationary  red  noise.  This  clearly  suggests  that  a  single  (Kolmogorov)  exponent 
is  not  an  adequate  description  of  this  aerothermal  turbulence  data.  Also  apparent  is  a  strong  coupling  between  di,3 
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and  a2,4,  which  share  a  similar  baseline  value  falling  between  the  theoretical  Kolmogorov  exponent  (a  =  —5/3)  and  a 
random  walk  (a  =  -2),  but  differ  greatly  in  the  periodicities  found  in  a2,4  (see  also  the  ft2,3  plot  of  Fig.  6).  These 
periodicities  are  suspected  to  be  due  to  an  exogenous  factor  unrelated  to  aerothermal  turbulence  such  as  a  periodic 
autopilot  correction,  inducing  a  local  vibration  (and  corresponding  recorded  temperature  fluctuation)  in  the  cold  wire 
probe  instrumentation.  For  a  relatively  small  segment  of  data  near  5.5  minutes,  there  is  a  close  convergence  of  the 
awi,e  indicating  the  presence  of  a  PPL  persistent  over  many  scales.  Figure  8  shows  the  long  scale  estimates  d8,10 
along  with  q 5  7  for  purposes  of  comparison.  Here  we  see  that  the  long  scale  trends  are  only  moderately  coupled, 
suggesting  once  again  that  different  power  laws  govern  different  ranges  of  scale  and  that  the  power  law  exponent  is 
time  dependent. 


Time  (min) 


Figure  7:  Weighted  least  squares  estimates  of  the  pure  power  law  exponent  a  over  scales  n  -  r7  for  the  turbulence 
data. 


Time  (min) 


Figure  8:  Weighted  least  squares  estimates  of  a  over  scales  T5  —  rio  for  the  turbulence  data. 
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5.3.2  Block  independent  least  squares  estimator 

To  form  instantaneous  least  squares  estimates  of  S  we  use  only  a  single  wavelet  coefficient  from  each  scale,  i.e.,  we  use 
W  t-  where  tj  is  the  time  index  of  the  jth  level  MODWT  coefficient  associated  with  time  t  in  {Xt)t=oy  The  time 
index  tj  can  be  meaningfully  determined  only  if  (approximate)  linear  phase  wavelet  filters  are  used.  Using  only  the 

Wj  t. ,  the  time  dependent  form  of  Eq.  22  becomes 

»  j0+i)>:i°g(u)yt(u)-Eiog(u)E^(u)  1 1 

_  2  ^  ( Ji  -  Jo  +  1)  £  l°g2(u)  -  (E  log(T-y))2 

where  all  sums  are  over  j  =  Jo, . . .  ,  J\  and 

Yt(Tj)  =  log (Wlt.)  -  -  log(2).  (26) 

To  decrease  the  variability  of  the  estimates,  the  level  J\  is  set  to  be  as  high  as  possible. 

Figure  9  shows  the  awise,t  (smoothed  with  a  moving  average  filter)  for  the  entire  turbulence  time  series  over  scales 
T5  _  r?  The  smoothed  a  Jiset  follow  the  same  patterns  exhibited  by  d5j7  shown  in  Fig.  8  and  11  with  a  bit  more 
variability  in  the  estimates.  These  variabilities  are  not  captured  by  the  block  dependent  estimators  and  illustrates  the 
importance  of  using  time  dependent  estimators  for  a  more  accurate  portrayal  of  the  (turbulence)  dynamics. 


Time  (min) 


Figure  9:  Unbiased  Swise,t  for  the  entire  7.5  million  point  turbulence  record.  A  moving  average  filter  was  used  to 
smooth  the  results  using  nonoverlapping  10,000  point  windows. 


5.3.3  Block  dependent  maximum  likelihood  estimator 

Wavelet-based  maximum  likelihood  techniques  can  be  used  in  harmony  with  an  FDP  model  as  another  means  of 
obtaining  estimates  for  FDP  parameters.  Suppose  X  =  {Xt}t=0  can  be  regarded  as  a  portion  of  a  zero  mean  stationary 
FDP  with  unknown  parameters  <5  and  a \  >  0.  Assuming  that  X  obeys  a  multivariate  Gaussian  distribution,  we  can 
estimate  6  and  by  maximizing  the  likelihood  function 


£(«5,<t£2|X) 


1  — XtS-‘X/2 

(2tt)"/2|£xI1/2 


(27) 


where  Ex  is  the  covariance  matrix  of  X,  and  |EX|  is  the  determinant  of  Ex.  Note  that  the  dependence  of  the  likelihood 
function  on  5  and  a]  is  through  Ex  alone.  In  a  practical  setting  however,  evaluation  of  £(<5,aJX)  is  computationally 
expensive  even  for  moderate  N  [4].  Secondly,  numerical  instabilities  can  occur  when  calculating  the  likelihood  function 

An  approximation  to  the  likelihood  function  can  be  calculated  using  DWT  coefficients.  Using  the  DWT  is  ad¬ 
vantageous  in  that  it  is  known  to  decorrelate  (long  memory)  FDP  and  related  processes,  forming  a  near  mdependen 
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Gaussian  sequence,  and  simplifying  the  statistics  significantly  [6].  The  decorrelation  property  allows  us  to  form  a 
reduced  log  likelihood  function  without  much  computational  burden: 

j 

l(S,d2(S)  |X)  —  N  =  N\og(a2(6))  +  log(C#J+1(«))  +  log(<?;(<*))  (28) 

i 


where 


(6)  = 


(29) 


and  C'  (6)  is  the  average  SDF  value  at  the  mid-frequency  of  the  band  corresponding  to  DWT  scale  t3  (see  [22]  for  explicit 
details  on  the  development  of  the  reduced  log  likelihood  function  using  the  DWT  coefficients).  For  our  purposes,  the 
mid-octave  values  are  calculated  using  the  SDF  defined  by  Eq.  1.  Minimizing  Eq.  28,  which  is  strictly  a  function  of  6, 
yields  the  maximum  likelihood  estimates  of  v2mU  and  5mie- 

The  variance  of  <5mie  can  be  calculated  as  follows:  let  W,  be  an  M  =  £\  M,  point  vector  containing  all  of  the 
interior  DWT  wavelet  coefficients.  Assuming  that  the  coefficients  in  Wr  are  uncorrelated  and  that  6  <E  [-1/2, L/2], 
then  for  large  N  the  estimator  <5m(e  is  approximately  Gaussian  distributed  with  mean  <5  and  variance 


(30) 


where 


(f>j  — 


4<t; 


var{Wj>t} 


l 


1/2  7>  mlog(2sin(7r/)) 
[2sin(7r/)]w 


(31) 


The  term  var{ Wjit}  is  the  process  wavelet  variance  defined  as 

ri/2  rV2  a  2 

var{W^,t}  =  2  Uj,L(f)Sx(f)  df  =  2  [2  sin(^-/)]2<5  32 

In  practice,  the  integrals  in  Eq.  31  and  32  can  be  approximated  through  numerical  integration  or,  based  on  the  view 
that  the  wavelet  transform  forms  an  octave  band  decomposition  (see  Fig.  1),  through  a  Taylor  series  expansion  about 
the  mid-octave  frequencies  for  levels  j  =  1,2  and  direct  integration  using  a  small  angle  assumption  for  j  >  2.  There 
is  generally  a  large  increase  in  computational  speed  when  using  the  bandpass  approach  with  a  relatively  small  loss  of 

Figures  10  and  11  show  the  maximum  likelihood  estimates  of  a  for  the  segmented  turbulence  data  set  described 
in  Sec.  5.1.  The  am| e  compare  quite  well  with  the  awUe  (see  Fig.  7  and  8)  with  the  exception  of  the  &i,3  and  Qs,io 
estimates.  The  difference  in  d1>3  is  directly  attributed  to  the  difference  in  models.  The  weighted  least  squares  estimates 
are  based  upon  a  pure  power  law  model  while  the  maximum  likelihood  estimates  are  based  on  an  FDP  whose  SDF 
adheres  to  a  power  law  for  /  <  1/8,  but  diverges  from  this  behavior  for  1/8  <  f  <  1/2,  corresponding  to  wavelet 
scale  n  and,  to  a  lesser  extent,  t%.  Figure  12  demonstrates  the  departure  of  an  FDP  SDF  from  that  of  a  pure  power 
law  process.  The  difference  in  q8>io  estimates  is  mainly  due  to  a  high  degree  of  sampling  variability  in  the  wavelet 
variance  estimates  at  long  scales,  in  particular,  at  scale  Tio  the  number  of  DWT  wavelet  coefficients  is  diminished  to 
Nio  =  104/210  «  10,  i.e.  a  very  small  set  of  coefficients  to  obtain  the  amie. 


5.3.4  Block  independent  maximum  likelihood  estimator 

The  MODWT  coefficients  can  be  circularly  permuted  to  form  an  approximate  zero  phase  decomposition  using  Daubechies’ 
least  asymmetric  or  Coiflet  filters.  Using  the  permuted  coefficients  a  reduced  log-likelihood  function  can  be  min¬ 

imized  to  form  an  “instantaneous”  MLE  of  the  FDP  parameter  6.  The  reduced  likelihood  function  consists  of  a 
collection  of  wavelet  coefficients,  one  from  each  scale,  that  are  colocated  in  time  and  is  defined  as 

um  =  +x>g(c'(<5)).  (33) 
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Figure  10:  Maximum  likelihood  estimates  of  the  pure  power  law  exponent  a  over  scales  n  -r7  for  the  turbulence  data. 
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Figure  11:  Maximum  likelihood  estimates  of  a  over  scales  75  —  T\ 0  for  the  turbulence  data. 

This  approach  is  useful  for  processes  that  exhibit  a  variation  in  power  law  behavior  over  time.  Figure  13  shows 
an  example  of  unbiased  6 for  a  small  segment  of  aerothermal  turbulence  data.  The  results  indicate  a  modest 

fluctuation  of  about  a  mean  value  of  6mie,t  =  1-04  corresponding  to  a  near-red  noise  amu,t  ==  -2.08.  The 

estimated  density  function  (Fig.  13(b))  and  quantile-quantile  (Q-Q)  plot  with  respect  to  a  normal  density  (Fig.  13(d)) 
both  indicate  a  near  Gaussian  distribution  of  the  estimates.  The  “S”  shape  in  the  Q-Q  plot  implies  that  the 

density  function  of  the  <5 m;e,t  has  longer  tails  than  does  a  Gaussian. 

5.3.5  Discussion  of  FDP  parameter  estimations 

In  this  report  we  have  introduced  four  wavelet  based  techniques  to  estimate  FDP  model  parameters  for  aerothermal 
turbulence  data:  time  (in)dependent  weighted  least  squares  estimators  and  time  (in)dependent  maximum  likelihood 
estimators.  For  the  time  independent  techniques,  results  were  calculated  for  a  7.5  million  point  aerothermal  data  set 
taken  in  contiguous  nonoverlapping  10, 000  point  intervals.  The  time  independent  results  verify  the  presence  of  time 
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Figure  12:  Spectral  density  functions  for  an  FDP  realization  with  S  =  1  and  a  pure  power  law  process  with  a  =  -2. 
The  vertical  lines  represent  the  division  in  octaves  corresponding  to  various  wavelet  scales. 
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Figure  13:  Unbiased  for  a  sample  segment  of  aerothermal  data.  Only  the  scales  n  -  r4  were  used  for  this 

estimation.  Shown  in  the  figure  are  (a)  6mle,t  with  95%  confidence  intervals,  (b)  the  corresponding  estimated  probability 
density,  (c)  a  smoothed  (low  pass  filtered)  version  of  <5mie,t  with  confidence  intervals  and  (d)  a  Q-Q  plot  of  the  Smie,t 
with  respect  to  a  normal  distribution. 


varying  power  law  processes  with  a  power  law  exponent  spanning  stationary  blue  noise  to  nonstationary  red  noise  and 
applicable  over  finite  and  distinct  ranges  of  scale.  The  time  dependent  results  were  calculated  for  a  small  (4096  point) 
aerothermal  data  segment.  These  “instantaneous”  estimators  are  very  useful  for  detection  of  changes  in  system  whose 
dynamics  fluctuate  rapidly  as  a  function  of  time  or  scale.  For  time  independent  estimates,  we  introduce  methods  for 
calculating  the  variance  of  FDP  parameters,  that,  under  a  chi-squared  distribution  assumption  made  on  the  variance 
of  the  wavelet  coefficients,  can  be  related  to  confidence  intervals. 

We  showed  in  our  research  that  in  order  to  provide  consistent  results  between  maximum  likelihood  and  weighted 
least  squares  estimates  of  the  FDP  parameter  <5,  one  needs  to  be  aware  of  the  divergence  of  the  FDP  spectral  density 
function  from  that  of  a  pure  power  law  process  at  high  frequencies  (small  scales).  As  a  course  of  future  research  we 
intend  to  augment  an  FDP  with  an  ARMA  model  for  small  scale  variations,  providing  more  flexibility  than  either  a 
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pure  power  law  or  FDP  model  alone. 

The  collection  of  results  supports  the  efficacy  and  validity  of  using  stochastic  FDP  models  for  aerothermal  turbulence 
data.  The  use  of  wavelet  techniques  allows  us  to  examine  the  apparent  self-similar  behavior  of  turbulence  over  distinct 
and  finite  ranges  of  scale,  and,  when  used  with  approximate  linear  phase  wavelet  filters,  provides  an  effective  means 
of  characterizing  time  varying  power  law  processes. 

5.4  Wavelet-based  forecasting  for  non-stationary  multi-scale  fractal  processes 

5.4.1  Nonlinear  deterministic  modeling  of  wavelet  subband  processes 

To  complement  stochastic  subband  modeling,  we  assessed  the  efficacy  of  deterministic  nonlinear  modeling  for  aerother¬ 
mal  turbulence  data.  Figure  14  suggests  one  possible  path  to  take  for  modeling  and  prediction  of  subband  processes 
via  wavelet  techniques.  The  original  time  series  is  fed  into  (a  form  of)  the  discrete  wavelet  transform  to  produce  the 
subband  processes  Wjit-  A  portion  of  the  Wjtt  coefficients  are  used  for  training  the  prediction  algorithm  based  on 
stochastic  and  deterministic  models  to  produce  a  predicted  set  of  coefficient.  These  predicted  coefficients  are  then 
augmented  with  the  training  set  and  an  inverse  wavelet  transform  performed  to  yield  a  predicted  time  series.  The 
hypothesis  is  that  the  inverted  series  will  have  better  predictive  power  than  those  techniques  based  on  the  original 
series  alone.  This  flexible  scheme  is  relevant  to  analysis  of  turbulence  data  in  three  important  ways: 

1.  Wavelet  transforms  can  be  used  to  decompose  a  complex,  possibly  nonstationary  and  multi-scale  series  into  a 
few  crystals  (a  ‘crystal’  is  a  set  of  wavelet  coefficients  at  a  particular  scale),  each  of  which  has  its  own  dynamical 
and  statistical  behavior,  and  different  prediction  techniques  can  be  applied  separately  to  each  crystal. 

2.  The  implicit  differencing  operations  of  wavelet  filters  can  be  used  to  analyze  nonstationary  sequences  with 
stationary  backward  differences.  The  stationarity  of  the  wavelet  coefficients  provides  a  means  of  generating 
useful  statistics  on  the  data. 

3.  Using  approximate  linear  phase  wavelet  and  scaling  filters,  the  nondecimated  versions  of  the  wavelet  transform 
(MODWT,  MODWPT)  can  be  used  to  meaningfully  colocate  the  wavelet  coefficients  with  events  in  the  original 
series  (an  important  feature  for  the  modeling  of  cross-scale  components). 


Figure  14:  Flow  chart  for  subband  modeling  and  prediction. 

or  phase  I,  we  implemented  the  scheme  outlined  in  Fig.  14  using  nonlinear  dynamic  models  on  preselected  subbands  of 
a  MODWT  decomposition.  In  nonlinear  dynamics,  a  quick  means  of  detecting  deterministic  structure  in  a  time  series 
is  through  lag  plots,  produced  by  plotting  a  time  series  versus  a  shifted  version  of  itself.  We  refer  to  the  data  in  these 
plots  as  being  embedded.  A  “fuzzy”  or  cloudy  clustering  of  points  in  the  embedding  suggests  the  presence  of  either  a 
high  dimensional  deterministic  structure  or  a  stochastic  structure  in  the  data.  Conversely,  well  defined  patterns  in  the 
embeddings  suggest  an  underlying  deterministic  structure.  Figure  15  shows  sample  lag  plots  for  three  wavelet  crystals. 
Deterministic  structure  can  be  seen  in  the  embeddings  of  the  d$  and  dj  crystals  while  a  stochastic  clutter  is  seen  in  the 
embedding  for  <fx;  illustrating  the  need  for  a  combination  of  stochastic  and  deterministic  modeling  in  the  sub  bands. 

As  a  preliminary  investigation,  we  applied  the  prediction  technique  outlined  in  Fig.  14  to  a  3000  point  sunspots 
series  which  is  known  to  have  strong  deterministic  components  mixed  with  colored  noise.  We  isolated  the  deterministic 
cyclical  components  from  the  noise  via  a  MODWT  of  the  data.  We  then  hypothesized  that  the  cyclical  variations  were 
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Figure  15:  Time  histories  and  phase  plane  embeddings  of  MODWT  wavelet  crystals  dl,  d5,  and  d.7.  The  data  is  a 
3000  point  subset  of  the  aerothermal  turbulence  data. 


well  modeled  by  a  nonlinear  deterministic  system  and  used  a  relevant  nonlinear  technique,  the  local  Lyapunov  exponent 
(LLE)  prediction  method  developed  by  one  of  us  [14,  15, 13],  to  predict  the  latter  third  of  the  of  the  data  using  the  first 
two  thirds  as  a  training  set.  The  LLE  technique  was  applied  to  crystals  which  displayed  deterministic  patterns  in  the 
phase  plane.  For  prediction  within  each  “stochastic  crystal” ,  the  sample  mean  of  training  set  was  extended  as  an  ersatz 
prediction  of  future  values.  This  allowed  us  to  independently  test  the  prediction  capabilities  of  the  LLE  in  the  context 
of  wavelet  subband  processing.  The  predicted  wavelet  coefficients  were  augmented  with  the  training  coefficients,  and 
the  inverse  MODWT  applied.  We  refer  to  the  predicted  time  series  as  a  local  Lyapunov  wavelet  synthesis  (LLWS). 
For  purposes  of  comparison,  a  commonly  known  nonlinear  technique  known  as  nearest  neighbor  (NN)  prediction,  was 
applied  to  the  original  time  series.  Preliminary  results  indicate  that  the  LLWS  approach  achieves  better  prediction 
accuracy  than  the  traditional  NN  algorithm  as  the  magnitudes  and  phases  of  the  sunspots  series  are  better  fit  by  the 
LLWS.  The  NN  prediction  of  the  original  series  (only)  fails  to  predict  the  high  amplitude  sunspot  variations  (Fig.  16). 
The  smoothness  of  the  LLWS  of  sunspots  data  is  simply  due  to  the  fact  that  a  constant  was  used  for  the  predicted 
coefficients  in  noisy  subbands.  An  obvious  improvement  to  the  technique  would  be  to  apply  a  reasonable  stochastic 
model  to  the  noisy  wavelet  crystals.  In  phase  II,  we  plan  to  refine  the  technique  by  testing  an  ensemble  of  stochastic 
and  deterministic  subband  models  during  the  remainder  of  the  phase  I  contract. 

Based  on  the  results  shown  in  Fig.  15  and  16,  we  suggest  that  it  is  feasible  to  use  nonlinear  deterministic  prediction 
in  the  context  of  wavelet  subband  modeling.  The  efficacy  of  the  prediction  is  a  direct  function  of  the  level  of  noise 
in  the  data.  We  therefore  suggest  that  this  technique  would  be  most  effective  for  large-scale  variations  present  in 
turbulence  time  series. 


6  Estimates  of  Technical  Feasibility 

In  order  to  address  the  technical  feasibility  of  our  work  in  phase  I,  we  summarize  the  relationship  with  future  research 
and  development,  discuss  the  commercial  potential  of  the  product  into  the  current  S+ Wavelets  module  and  a 
(possible)  Matlab  toolbox.  We  close  with  a  company  mission  and  product  line  and  summarize  our  commercialization 
record. 


6.1  Relationship  with  Future  Research  and  Development 

Our  phase  I  research  has  resulted  in  new  ‘blocked’  and  ‘local’  schemes  for  estimating  the  parameters  of  a  non-stationary 
multi-scale  fractal  process  and  for  attaching  confidence  intervals  to  some  -  but  not  all  -  of  these  estimates.  We  have 
implemented  our  new  methodolgy  in  the  form  of  S-Plus  functions.  We  have  used  these  functions  to  demonstrate  the 
usefulness  of  this  methodolgy  for  studying  the  time- varying  properties  of  an  aerothermal  time  series  provided  to  us 
by  our  sponsors.  Our  phase  I  research  has  thus  established  the  foundation  for  our  phase  II  efforts,  in  which  we  wi 
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Figure  16:  Prediction  results  the  sunspots  data  (dots)  using  LLWS  prediction  on  selected  wavelet  subbands  (solid 
thick  lines)  and  nearest  neighbor  prediction  on  the  original  series  (solid  thin  lines). 


extend  the  methodology  to  make  it  applicable  to  a  wider  range  of  problems  and  create  a  library  of  commerical-grade 

C  routines.  . 

Our  phase  II  research  will  extend  the  phase  I  research  by  completing  the  statistical  theory  behind  our  wavelet- 

based  estimators.  In  phase  II  we  will  also  improve  the  computational  efficiency,  robustness  and  portability  of  our 
computer  code  by  rewriting  our  algorithms  as  C  routines.  In  addition  we  will  create  code  for  a  much  wider  collection 
of  algorithms  that  can  be  of  use  in  modelling  fractal  processes.  We  will  also  provide  our  Air  Force  sponsors  with  code 
in  whatever  form  they  deem  best  to  allow  them  to  make  use  of  our  research  results  in  their  own  work  (most  likely 
either  in  the  form  of  a  C  library  or  as  a  Matlab  toolbox  with  an  appropriate  GUI  to  help  guide  the  non-expert  analyst 

on  the  use  of  the  methodology).  .  , 

The  phase  II  research  will  lay  the  groundwork  for  the  development  of  commercial-grade  software  toolkits  in  both 
S-Plus  and  Mathcad  for  wavelet-based  analysis  of  time  series  exhibiting  fractal  properties.  In  addition,  the  training 
course  that  we  develop  on  analyzing  non-stationary  multi-scale  fractal  processes  can  be  turned  into  documentation 
suitable  for  use  with  these  toolbox. 

6.2  Commercial  Potential 

A  primary  strength  of  this  proposal  is  a  realistic  and  credible  commercial  plan.  The  heart  of  the  plan  is  develop  state- 
of-the-art  software  products  which  will  appeal  to  a  broad  audience  of  users.  MathSoft  is  a  market  leader  in  wavelet  and 
advanced  statistical  technology.  MathSoft  also  has  an  impressive  record  on  commercialization  of  Government  funded 

0g  0  2^  J*  , 

Here  we  present  our  specific  commercialization  plans  (section  6.2.1),  give  background  on  our  company  (section  6.3) 
and  its  past  record  of  commercializing  government-sponsored  research  (section  6.4). 

6.2.1  Specific  Commercialization  Plans 

Our  research  will  lead  to  phase  III  development  of  a  user-friendly  S-Plus  and  Mathcad  toolboxes  for  analysis  of  frac¬ 
tal  processes,  a  library  of  portable  and  reusable  software  tools,  and  short  courses  with  an  associated  book/hypermedia 
document  that  will  guide  practitioners  in  the  analysis  of  multi-scale  fractal  processes.  In  commercialization  of  the 
fractal  processes  toolkits,  we  will  be  guided  by  our  experience  in  developing  and  marketing  wavelets  technology, 
which  resulted  in  the  S+WAVELETS  module  for  S-Plus,  a  MATHCAD  Wavelets  extension  pack,  a  book  published  by 
Springer- Verlag,  educational  courses  taught  to  industry  and  research  laboratories,  and  consulting  opportunities  in 
industry  (e.g.,  cellular  telephone  fraud  detection).  Because  the  underlying  software  will  be  based  on  a  well  organized 
set  of  C  libraries,  we  will  be  able  to  incorporate  these  tools  into  more  than  one  product.  Marketing  the  toolkits  on 
more  than  one  platform  is  critical  for  successful  commercialization  since  the  toolkits  individually  target  relatively  small 
markets. 
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Our  specific  commercialization  plans,  roughly  in  order  of  priority,  are  as  follows. 

S-Plus  Fractal  Processes  Toolbox:  The  primary  development  that  will  follow  in  phase  III  from  the  proposed 
research  will  be  a  toolkit  based  on  the  collection  of  C  routines  developed  in  phase  II.  We  will  embed  this  toolkit  within 
S-Plus  as  a  module  for  advanced  fractal  analysis.  S-Plus  is  an  ideal  platform  for  the  proposed  development,  with  a 
huge  existing  customer  base  including  technically  literate  data  analysts  from  a  variety  of  disciplines. 

Fractal  Processes  Extension  Pack:  A  second  product  which  would  naturally  evolve  from  our  research  plans  is  a 
Fractal  Processes  extension  pack  for  MathcadA  Mathcad  extension  pack  would  have  a  lower  price  point  than  an 
S-PLUS  toolbox  and  would  leverage  the  large  installed  base  of  Mathcad  (currently  over  1  million  users). 

Books  and  Hypermedia:  A  key  component  to  commercial  success  of  our  research  will  be  the  development  of  a  book 
linking  the  practice  and  theory  of  fractal  processes  analysis.  With  the  advent  of  multimedia  technology  and  the  World 
Wide  Web,  we  can  go  beyond  the  traditional  textbook  format  and  provide  a  state-of-the-art  interactive  hypermedia 
book  linked  to  the  Fractal  Processes  Toolbox.  The  book  will  contain  a  rich  collection  of  applications  drawn  from  our 
phase  II  and  other  research  efforts.  We  will  also  pursue  more  traditional  outlets  for  publication. 

Other  Vertical  Markets:  Full  realization  of  our  research  goals  will  make  analysis  of  fractal  processes  a  standard 
part  of  signal  processing.  We  see  substantial  opportunity  to  license  our  technology  to  specialized  software  systems 
dedicated  to  vertical  markets  where  fractal  signals  are  important,  including  biomedical  engineering  and  finance.  In 
addition,  if  we  embed  our  C  library  within  a  Matlab  toolbox  to  meet  the  requirements  of  our  sponsors,  we  will  seek 
avenues  for  marketing  this  toolkit. 

Teaching  and  Consulting:  To  supplement  the  software  products  and  books,  we  expect  to  offer  related  short  courses 
and  software  training.  Short  courses  in  signal  processing  oriented  towards  industry  and  government  are  routinely 
offered  by  various  groups.  A  large  part  of  the  Data  Analysis  Products  Division’s  revenues  results  from  training  classes. 
The  tuition  for  training  classes  and  short  courses  typically  exceeds  $1,000  per  participant.  Our  software  and  casebook 
will  also  serve  as  a  magnet  for  recruiting  consulting  projects.  The  Data  Analysis  Products  Division  is  rapidly  expanding 
its  consulting  group,  partly  as  a  result  of  its  high-end  research  efforts  (e.g.,  in  wavelet  analysis).  We  will  aggressively 
pursue  related  consulting,  particularly  in  areas  such  as  financial  engineering. 

6.3  Proposing  Company,  Mission  and  Main  Products 

The  research  and  software  development  for  this  project  will  be  done  primarily  at  the  facilities  of  the  Data  Analysis 
Products  Division  of  MathSoft,  Inc.  MathSoft  is  a  U.S.  owned  publicly  traded  small  business  (NASDAQ:  MATH) 
that  is  based  in  Cambridge,  Massachusetts,  and  specializes  in  mathematically  oriented  software.  The  Data  Analysis 
Products  Division  (DAPD)  specializes  in  statistical  software  products  and  includes  the  Research  Department.  Overall 
the  company  has  195  employees,  with  95  of  these  being  in  the  DAPD.  ...... 

The  Research  Department  has  over  36  employees  of  whom  approximately  28  have  Ph.D.’s  in  disciplines  such  as 
statistics,  electrical  engineering,  mechanical  engineering,  physics,  computer  science  and  applied  mathematics.  Research 
staff  actively  collaborate  with  researchers  at  universities  and  other  institutions.  These  connections  are  particularly 
strong  with  the  University  of  Washington. 

The  primary  mission  of  the  DAPD  is  to  develop,  market,  and  support  cutting  edge  scientific  computing  software 
environments  for  high-interaction  statistical  and  visual  data  analysis.  The  DAPD  offers  the  S-Plus  and  Axum 
product  line.  S-Plus  is  an  interactive  computing  environment  for  graphics,  data  analysis,  statistics  and  mathematical 
computing.  S-PLUS  is  a  super-set  of  the  S  object-oriented  language  and  system  designed  by  R.  Becker,  J.  Chambers 
and  A.  Wilks  at  AT&T  Bell  Laboratories,  Murray  Hill,  NJ,  for  which  Chambers  won  the  prestigious  1998  ACM 
Software  System  Award  (this  annual  award  carries  a  prize  of  $10,000  and  has  been  given  in  the  past  to  the  developers 
of,  e.g.,  UNIX,  TeX,  PostScript,  TCP/IP  and  the  World  Wide  Web).  The  DAPD  recently  released  S-Plus  4.0, 
incorporating  advanced,  object-oriented  graphics  into  the  S-Plus  environment.  S-Plus  4.0  was  recently  given  the 
Personal  Computer  World  Editor’s  Choice  Award.  Our  customer  base  represents  almost  every  major  industry,  with 
particular  strength  in  high-tech  manufacturing,  biotechnology,  engineering,  and  finance.  S-Plus  is  available  in  UNIX 
and  Windows  versions.  Our  product  STATSERVER  supports  distributed  access  to  S-Plus  over  internal  and  external 
networks.  Web  browsers  and  customized  applications  can  submit  S-PLUS  expressions  to  a  StatServer  and  retrieve 
the  results,  including  graphics.  StatServer  2.0  recently  won  a  prestigious  industry  award,  when  it  was  named  to  the 
Crossroads  A-list  by  Open  Systems  Advisors. 

The  flagship  product  of  MathSoft  is  Mathcad,  a  numeric  and  symbolic  mathematical  analysis  software  package. 
Mathcad  combines  natural  notation  equation  manipulation,  text  and  graphics  into  a  single  electronic  document. 
MathSoft  is  also  a  strong  played  in  the  education  market  with  its  award-winning  StudyWorks!  software  for  building 
skills  in  mathematics  and  science  for  high  school  students.  The  company  has  more  than  one  million  users  of  its  Math¬ 
cad,  StudyWorks!,  S-Plus,  StatServer  and  Axum  software  worldwide.  Users  include  technical  professionals 
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worldwide  at  more  than  90%  of  the  Fortune  1,000  companies  and  over  500  government  installations,  and  students  and 
faculty  at  over  2,000  colleges  and  universities. 

6.4  Commercialization  Record 

The  Data  Analysis  Products  Division  of  MathSoft  has  an  outstanding  record  in  the  commercialization  of  advanced  data 
analysis  technology.  Our  core  product,  S-Plus,  is  a  commercial  version  of  the  S  language  developed  in  the  research 
environment  of  Lucent  technologies.  In  fact,  the  DAPD  would  never  have  existed  if  it  were  not  for  our  abilities  to 
commercialize  data  analysis  software. 

The  DAPD  has  further  established  a  record  of  commercializing  advanced  data  analysis  software  developed  partially 
using  government  funds  under  the  SBIR  program  and  the  NASA  EOCAP  program.  In  the  past  four  years,  we  have 
completed  eight  Phase  II  SBIR  awards  and  one  NASA  EOCAP  award.  Partially  supported  by  these  awards,  we  have 
commercialized  and  shipped  six  products:  S+Wavelets,  S+API,  S+DOX,  S-Plus  for  ARC/INFO,  S+Spatial, 
and  S-Plus  for  ArcView  GIS.  Additional  products  are  under  development  based  on  SBIR-funded  work  in  signal  and 
image  processing,  group  sequential  analysis,  and  statistics  for  data  with  missing  values.  In  addition,  we  have  shipped 
a  Mathcad  Wavelets  function  pack  in  1998  based  on  technology  developed  for  S+Wavelets. 


7  Summary 

Each  of  the  objectives  in  Sec.  3  has  been  addressed  in  our  research  efforts  for  phase  I.  The  feasibility  of  this  project  is 
estimated  to  be  very  high  considering  (i)  the  large  number  of  functions  written  for  the  existing  S+Wavelets  module, 
(ii)  the  promising  results  of  these  functions  as  applied  to  real  aerothermal  turbulence  data,  (iii)  the  commercialization 
record  of  MathSoft,  Inc.  and  (iv)  the  demand  for  such  tools  in  other  areas  such  as  in  cardiodynamics,  (internet)  traffic 
modeling,  and  financial  time  series  analysis  to  name  a  few.  Through  partial  funding  by  this  STTR  phase  I  contract, 
our  research  has  lead  to  (2)  technical  reports,  (1)  conference  paper,  (2)  journal  articles  (pending  acceptance),  and  (1) 
book  on  wavelet  methods  for  time  series  analysis. 

We  would  like  to  thank  our  sponsors  for  funding  this  project  and  we  look  forward  to  our  cooperative  phase  II 
effort. 
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